Abstract

Background: De novo retrotransposition of Alu elements has been recognized as a major driver for insertion 
polymorphisms in human populations. In this study, we exploited Alu-anchored bisulfite PCR libraries to identify 
evolutionarily recent Alu element insertions, and to investigate their genetic and epigenetic variation. 

Results: A total of 327 putatively recent Alu insertions were identified, altogether represented by 1,762 sequence 
reads. Nearly all such de novo retrotransposition events (316/327) were novel. Forty-seven out of forty-nine 
randomly selected events, corresponding to nineteen genomic loci, were sequence-verified. Alu element insertions 
remained hemizygous in one or more individuals in sixteen of the nineteen genomic loci. The Alu elements were 
found to be enriched for young Alu families with characteristic sequence features, such as the presence of a 
longer poly(A) tail. In addition, we documented the occurrence of a duplication of the AT-rich target site in their 
immediate flanking sequences, a hallmark of retrotransposition. Furthermore, we found the sequence motif
(TT/AAAA) that is recognized by the ORF2P protein encoded by LINE-1 in their 5'-flanking regions, consistent with the
fact that Alu retrotransposition is facilitated by LINE-1 elements. While most of these Alu elements were heavily
methylated, we identified an Alu localized 1.5 kb downstream of TOMM5 that exhibited a completely
unmethylated left arm. Interestingly, we observed differential methylation of its immediate 5' and 3' flanking CpG 
dinucleotides, in concordance with the unmethylated and methylated statuses of its internal 5' and 3' sequences, 
respectively. Importantly, TOMM5's CpG island and the 3 Alu repeats and 1 MIR element localized upstream of this
newly inserted Alu were also found to be unmethylated. Methylation analyses of two additional genomic loci 
revealed no methylation differences in CpG dinucleotides flanking the Alu insertion sites in the two homologous 
chromosomes, irrespective of the presence or absence of the insertion. 

Conclusions: We anticipate that the combination of methodologies utilized in this study, which included repeat- 
anchored bisulfite PCR sequencing and the computational analysis pipeline herein reported, will prove invaluable 
for the generation of genetic and epigenetic variation maps. 



Background 

Repetitive elements constitute over 50% of the human 
genomic sequence [1]. The most prevalent repeats are 
the Alu family of SINEs, which comprise approximately 
10% of the human genome. A typical Alu element is 
approximately 300 bp long and contains two almost 
identical arms separated by an A-rich sequence. The 
ancestor of the Alu monomer is the 7SL RNA gene,
which encodes the RNA component of the signal recognition
particle (SRP) that is involved in the translocation
of newly synthesized proteins [2,3]. Similar to the 7SL
gene, Alu elements with intact promoters - namely A 
and B boxes - may be transcribed by RNA polymerase 
III [2,4]. With the aid of the LINE-encoded retrotran- 
sposition machinery, Alu transcripts gain mobility and 
expand in genomes through a process involving reverse 
transcription and integration [5]. 

Alu retrotransposition has been an important molecu- 
lar evolutionary force reshaping the primate genomes 
[6]. The expansion of Alu elements in primate
genomes dates back at least 60 million years [7]. Based
on their evolutionary history, Alu elements can be classified
into three major subfamilies: AluJ, AluS, and AluY
[8]. Among them, the youngest Alu elements - AluY
and its variants AluYa-g - remain very active, and exhibit
the highest rate of retrotransposition in the human
genome [9-12]. While several recent studies have shown 
that LINE-1 elements contribute substantially to the 
structural variations observed in the human genome 
[13-15], the retrotransposition rate of Alu elements is
ten times higher than that of LINE-1, with an estimated
one new insertion in every 21 births [16].

Decades of research have demonstrated that Alu ele- 
ments play important roles in the genome and tran- 
scriptome [17-20]. Alu elements may contribute a large 
number of transcription factor binding sites [21], some 
of which may serve as enhancers involved in tissue 
development [22,23]. In addition, some Alu elements 
may be expressed and Alu transcription affects nearby 
gene expression, distal gene expression, and global 
translation. For instance, the expression of an Alu in the 
promoter of an epsilon-globin gene was found to nega- 
tively regulate globin gene expression by transcriptional 
interference [24]. Recently, Alu RNA was found to be a 
modular transacting repressor of mRNA transcription 
[25]. Interestingly, such transcriptional suppression was 
found to be specific and limited to certain genes. Alu 
RNAs also affect translational initiation and were found 
to form stable, discrete complexes with the double- 
stranded RNA-activated kinase PKR, and to antagonize 
PKR activation [26]. Transcription derepression of 
otherwise active Alu elements, which so often reside 
within genes, may lead to the formation of double- 
strand RNA - if in antisense orientation - and ultimately 
to heterochromatinization and silencing of the gene 
[27]. 

One of the key mechanisms controlling Alu expres- 
sion is DNA methylation. The human genome has 
approximately 28 million CpG dinucleotides, 7 million 
of which are found within Alu elements [1]. In most 
somatic tissues, the CpG dinucleotides within the Alu 
sequence are heavily methylated to suppress Alu expres- 
sion [28,29]. It has been demonstrated that the A and B 
boxes (5-16 bp, and 75-84 bp from the 5' terminus, 
respectively) are critical cis-elements for Alu expression. 
In particular, methylation of the B box is thought to
inhibit protein binding and hence block Alu transcription
[30]. Although not sufficient on their own, demethylation
and the consequent transcription of Alu elements are
required for de novo retrotransposition [28]. Methylated
CpGs can undergo deamination, and the resulting
mutations can render Alu elements incapable of
retrotransposition [8,9].

Although much effort has been made to identify struc- 
tural variations resulting from Alu integration, much 
less is known with regard to the epigenetic status of 
newly inserted elements and of their flanking genomic 
sequences. Here we report the utilization of an Alu-anchored
bisulfite PCR strategy to generate methylation
maps for thousands of Alu elements in human cerebellum
and in ependymomas [31,32]. In this approach,
most of the targeted Alu elements are members of the 
active AluY subfamilies. In this study, we analyzed the 
aforementioned datasets to identify newly integrated Alu 
elements, to investigate sequence characteristics and 
commonalities of their integration sites, to uncover their 
methylation statuses, and to determine whether the 
methylation patterns of the sequences surrounding their 
integration sites would be altered in the alleles harbor- 
ing the insertion in individuals hemizygous for the Alu 
retrotransposon. 

Results 

Identification of recent Alu insertions 

The method developed by Xie and colleagues was initi- 
ally designed to generate a methylation map for a subset 
of young Alu elements [32]. The strategy applied a pri- 
mer targeting CpG-rich Alu repeats to simultaneously 
amplify thousands of Alu elements and their 5' flanking 
sequences. Unequivocal mapping of these repeats was 
therefore achieved through their - most often unique - 
5' flanking sequences. Eight Alu libraries were derived 
with this strategy, six from ependymomas and two from 
normal brain tissues [31,32]. 

In previous studies, a number of sequence reads from 
these libraries could only be partially mapped to the 
human reference sequence. In order to determine 
whether any of these sequence reads corresponded to a 
novel Alu integration event, i.e. one that was not yet 
documented in the UCSC database, we designed a com- 
putational pipeline to reanalyze these datasets (Figure 
1). For 158,591 sequence reads partially mapped in pre- 
vious studies, we first masked Alu sequences and then 
selected the ones containing at least 40 bp of 5' flanking 
sequences. A total of 24,820 sequence reads were thus 
identified. The Alu flanking sequences were then 
extracted from these reads and subjected to Megablast 
against in silico bisulfite converted human reference 
genome sequence. Unambiguous mapping was achieved 
for 8,738 sequence reads. As expected, the majority of
these reads (79.8%) mapped to genomic sequences adjacent
to an Alu element. Further examination of the
remaining 1,762 sequence reads (Additional File 1,
Table S1) enabled their grouping into 327 clusters
according to their genomic coordinates (Additional File
2, Table S2). It is noteworthy that due to the highly
stringent mapping criteria applied in our previous stu- 
dies [31,32], a few mismatches in the alignments 
between the reference genomic sequence and the 
sequences generated from the Alu libraries were suffi- 
cient to lead to their classification as "partially mapped" 
reads. 

158,591 sequences partially mapped in previous studies
-> 24,820 sequences containing at least 40 bp of non-Alu sequence
-> 8,738 sequences unambiguously mapped to the in silico bisulfite converted reference genome
-> 1,762 sequences (327 clusters) with a putative insertion site at least 10 bp from the most adjacent Alu
-> 316 clusters identified as putative novel Alu integrations (not present in either dbSNP or dbRIP)

Figure 1 Computational pipeline developed to identify putative Alu insertions from Alu-anchored bisulfite PCR libraries
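
The filtering and mapping steps summarized in Figure 1 can be illustrated with a short Python sketch. It is not the code used in this study: the function names, the description of a read by the offset at which its masked Alu sequence begins, and the 100 bp clustering distance are assumptions made for the example. The sketch shows (i) in silico bisulfite conversion of a reference strand, (ii) selection of reads carrying at least 40 bp of 5' flanking sequence, and (iii) grouping of mapped coordinates into clusters.

```python
from typing import List, Tuple

MIN_FLANK_BP = 40          # threshold stated in the text
CLUSTER_DISTANCE_BP = 100  # assumption for this sketch, not taken from the study


def bisulfite_convert(seq: str, strand: str = "+") -> str:
    """Fully convert a reference strand in silico.

    Top strand ("+"): every C is read as T.
    Bottom strand ("-"), written in top-strand coordinates: every G is read as A.
    """
    seq = seq.upper()
    if strand == "+":
        return seq.replace("C", "T")
    if strand == "-":
        return seq.replace("G", "A")
    raise ValueError("strand must be '+' or '-'")


def has_enough_flank(alu_start_in_read: int, min_flank: int = MIN_FLANK_BP) -> bool:
    """True if the read carries at least `min_flank` bp upstream of the Alu."""
    return alu_start_in_read >= min_flank


def cluster_sites(sites: List[Tuple[str, int]],
                  max_gap: int = CLUSTER_DISTANCE_BP) -> List[List[Tuple[str, int]]]:
    """Group (chromosome, position) pairs that lie within `max_gap` bp of each other."""
    clusters: List[List[Tuple[str, int]]] = []
    for chrom, pos in sorted(sites):
        last = clusters[-1][-1] if clusters else None
        if last and last[0] == chrom and pos - last[1] <= max_gap:
            clusters[-1].append((chrom, pos))
        else:
            clusters.append([(chrom, pos)])
    return clusters
```

Converting the reference on both strands before alignment is one common way to let bisulfite-treated reads map despite C-to-T changes; whether the original pipeline handled the strands in exactly this manner is not stated in the text.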



We examined the distribution of the 327 clusters
comprising 1,762 sequence reads across the eight Alu-anchored
bisulfite PCR libraries (Table 1). Out of 327
clusters, 163 clusters (49.8%) were found to be sup- 
ported by more than one sequence read and 87 clusters 
(26.7%) were found to be present in more than one 
library. Among these 87 clusters, 56 clusters (64.4%) 
were found in both normal and tumor tissues. This indi- 
cates that a majority of these putative insertions are not 
associated with tumorigenesis and/or cancer progres- 
sion. In addition, a library derived from a normal brain 
tissue contributed 159 clusters (48.6% of the total 327 
clusters) with 692 sequence reads (39.3% of 1,762 total 
sequence reads), while a library derived from a relapsed 



aggressive ependymoma only contributed 16 clusters 
with 23 sequence reads. Based on the difference in num- 
ber of sequence reads generated from each library, we 
normalized - for each library - the number of putative 
insertions that were identified, to the number of Alu 
repeats that were successfully mapped to the reference 
genome. No significant difference was observed in this
ratio between normal and tumor tissues (p = 0.34,
t-test).
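
The normalization described above can be reproduced from the counts in Table 1. The following Python sketch is illustrative only; the variable names are ours, and the expression of the ratio as a percentage is an inference from the fact that it reproduces the ratio row of Table 1. The sketch does not attempt to reproduce the reported p value, since the exact form of the t-test is not specified.

```python
from statistics import mean

# Counts transcribed from Table 1; order: NC1, NC2, PA1, PA2, PA3, PA4, PA5, RL.
SAMPLES = ["NC1", "NC2", "PA1", "PA2", "PA3", "PA4", "PA5", "RL"]
REGIONS_MAPPED = [31871, 47071, 31683, 33760, 41485, 33937, 33052, 36957]
REGIONS_WITH_INSERTION = [28, 159, 36, 61, 113, 62, 16, 23]


def insertion_ratios_percent():
    """Putative insertion clusters per 100 mapped Alu regions, for each library."""
    return {
        sample: 100.0 * inserted / mapped
        for sample, inserted, mapped in zip(SAMPLES, REGIONS_WITH_INSERTION, REGIONS_MAPPED)
    }


ratios = insertion_ratios_percent()
normal_mean = mean(ratios[s] for s in ("NC1", "NC2"))
tumor_mean = mean(ratios[s] for s in SAMPLES if s not in ("NC1", "NC2"))

for sample in SAMPLES:
    print(f"{sample}: {ratios[sample]:.2f}%")
print(f"normal mean {normal_mean:.2f}%, tumor mean {tumor_mean:.2f}%")
```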

To investigate whether these putatively new Alu inser- 
tions had been identified in previous studies, we 
extracted 1,763 and 795 known polymorphic Alu ele- 
ments from dbSNP (The Single Nucleotide Polymorph- 
ism database, NCBI) [33] and dbRIP (Database of 



Table 1 Distribution of the 327 clusters, comprising 1,762 sequence reads, across the eight libraries.

Sample ID* | NC1 | NC2 | PA1 | PA2 | PA3 | PA4 | PA5 | RL | Sum
sequenceRead_mapped | 245,825 | 460,438 | 256,014 | 238,633 | 336,103 | 283,730 | 276,505 | 245,600 | 2,342,848
genomeRegion_mapped | 31,871 | 47,071 | 31,683 | 33,760 | 41,485 | 33,937 | 33,052 | 36,957 | 289,816 (140,865**)
sequenceRead_Alu insertion | 56 | 692 | 78 | 216 | 453 | 203 | 23 | 41 | 1,762
genomeRegion_Alu insertion | 28 | 159 | 36 | 61 | 113 | 62 | 16 | 23 | 498 (327**)
Ratio, % (genomeRegion_Alu insertion/genomeRegion_mapped) | 0.09 | 0.34 | 0.11 | 0.18 | 0.27 | 0.18 | 0.05 | 0.06 |

*NC1: normal cerebellum; NC2: normal 4th ventricle lining tissue; PA1, PA2, PA3, PA4, and PA5: primary ependymoma tumors; RL: ependymoma tumor relapsed from PAS.
**Number of non-redundant clusters.

Retrotransposon Insertion Polymorphisms) [34], respectively.
This analysis revealed that 316 of the 327 clusters
were novel, i.e. they corresponded to as yet undocumented
de novo retrotransposition events. The putative integration
sites of 140 of these clusters (42.8%) were found to
localize to intronic regions, except for one, which
mapped to the 3'-UTR of TOMM40, a gene that codes
for a component of the translocase of the mitochondrial outer
membrane (TOM) complex. We further analyzed these genes
with NCBI's DAVID functional annotation tool to 
examine whether any specific gene category was more 
likely to harbor these Alu insertions. One hundred 
thirty-two genes were found annotated in the NCBI 
database. Compared to all genes annotated in the 
human genome, no significant enrichment was identified 
for this set of 132 genes in terms of biological process, 
cellular localization or molecular function (Additional 
File 2, Table S2). 
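
The comparison of the clusters against known polymorphic Alu elements amounts to a positional lookup. The Python sketch below is a minimal illustration under stated assumptions: the 50 bp matching window, the function names, and the coordinates in the usage note are hypothetical, and the text does not state how close a cluster had to lie to a dbSNP or dbRIP entry to be considered known.

```python
from bisect import bisect_left
from typing import Dict, Iterable, List, Tuple

WINDOW_BP = 50  # assumed matching tolerance; the study does not state one


def build_index(known: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Map each chromosome to a sorted list of known insertion positions."""
    index: Dict[str, List[int]] = {}
    for chrom, pos in known:
        index.setdefault(chrom, []).append(pos)
    for positions in index.values():
        positions.sort()
    return index


def is_novel(chrom: str, pos: int, index: Dict[str, List[int]],
             window: int = WINDOW_BP) -> bool:
    """True if no known insertion lies within `window` bp of `pos` on `chrom`."""
    positions = index.get(chrom, [])
    # First known position at or beyond the left edge of the window.
    i = bisect_left(positions, pos - window)
    return not (i < len(positions) and positions[i] <= pos + window)
```

For example, with `index = build_index([("chr1", 1000), ("chr2", 300)])`, a cluster at `("chr1", 1030)` would be treated as known and a cluster at `("chr1", 5000)` as novel.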

Verification of recent Alu insertions 

To validate the evolutionarily recent Alu de novo retro- 
transposition events identified in this study, we ran- 
domly selected twenty-one genomic loci encompassing 
such putative new Alu insertions. For each genomic 
locus, we designed primers based on the upstream and
downstream sequences surrounding the predicted integration
sites. With these primers, the PCR products
were expected to be ~120 bp (without Alu insertion) or
~420 bp (with Alu insertion). Due to the diploidy of the
human genome, three kinds of PCR results were
expected: (1) hemizygous Alu insertion: PCR products
of two different sizes were expected, one fragment with
the Alu insertion and another without it (spanning ~420
bp and ~120 bp, respectively); (2) homozygous Alu
insertion: only one PCR product was expected, this fragment
containing an Alu element (spanning ~420 bp);
(3) nullizygous Alu insertion: no Alu insertion was present
in either homologous chromosome, hence just one
small PCR product (spanning ~120 bp) was expected.
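
The three expected PCR outcomes can be written as a simple decision rule. The Python sketch below is illustrative and was not part of the validation experiments; the 50 bp tolerance around the expected fragment sizes and the function name are assumptions.

```python
from typing import Iterable

EMPTY_SITE_BP = 120   # approximate product size without the Alu insertion
FILLED_SITE_BP = 420  # approximate product size with the Alu insertion
TOLERANCE_BP = 50     # assumed tolerance for reading band sizes off a gel


def call_genotype(band_sizes: Iterable[int]) -> str:
    """Classify a locus from the sizes (bp) of the PCR products observed."""
    sizes = list(band_sizes)
    has_empty = any(abs(b - EMPTY_SITE_BP) <= TOLERANCE_BP for b in sizes)
    has_filled = any(abs(b - FILLED_SITE_BP) <= TOLERANCE_BP for b in sizes)
    if has_empty and has_filled:
        return "hemizygous"
    if has_filled:
        return "homozygous"
    if has_empty:
        return "nullizygous"
    return "undetermined"
```

A lane showing bands near 120 bp and 420 bp would be called hemizygous, while a lane with no band near either size is left undetermined rather than forced into a category.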

The Alu insertions were successfully verified for forty-seven
out of forty-nine cases representing nineteen genomic
loci (Figure 2). To ensure that the regions amplified
by PCR were indeed new Alu insertions, for each locus,
PCR products were cloned and sequence-verified. The
sequences representing these nineteen genomic loci were 
submitted to GenBank. Their accession numbers are: 
[HQ709117, HQ709118, HQ709119, HQ709120, 
HQ709121, HQ709122, HQ709123, HQ709124, 
HQ709125, HQ709126, HQ709127, HQ709128, 
HQ709129, HQ709130, HQ709131, HQ709132, 
HQ709133, HQ709134, HQ709135]. Fourteen out of the 
nineteen insertion events were predicted to occur in more 
than one individual. Interestingly, we found that nine out
of these fourteen insertions were hemizygous in all
individuals examined - i.e., the Alu insertion only occurred
in one of the two homologous chromosomes. The remaining
five insertions were hemizygous for some individuals
and homozygous for others. From a total of forty-seven
Alu insertions, thirty-six were found to be hemizygous
and eleven were found to be homozygous. The fact that
the majority of the insertions have remained hemizygous
in the genome may be interpreted as suggestive of their
recent evolutionary origin. However, that will remain speculative
until population studies are performed.

Genomic features and sequence characteristics of Alu 
elements and their flanking sequences 

It has been shown that polymorphic Alu elements and 
their flanking sequences may share some distinct sequence 
features [34,35]. Alu transcripts derived from elements
with a conserved structure would interact productively with
SRP9/14 host proteins and gain the ability to retrotranspose
[12]. The AluY subfamily and its variants Yc1, Yc2,
Ya5, Ya5a2, Ya8, Yd8, Yb8, and Yb9 are considered
to be very active due to the conservation of their structure.
To conclude the analysis, we classified the Alu
elements identified in this study according to their family of
origin. We found that the new insertions identified in this
study belong to the relatively recent family of AluY elements
or to the subfamilies AluYa5, AluYb8, AluYb9, and
AluYg6. It has also been shown that the occurrence of a
longer poly(A) tail might facilitate Alu retrotransposition
[35]. Our analysis revealed that all twenty-two new Alu
insertions that were sequence verified in this study have
an A-tail that ranged from 11 to 45 nucleotides, with an
average length of 29 bp.

Alu retrotransposition is facilitated by LINE-1 elements.
LINE elements encompass two open reading
frames, namely ORF1 and ORF2P. ORF1 encodes a
non-specific RNA binding protein, and ORF2P encodes
an endonuclease and a reverse transcriptase. During the
process of retrotransposition ORF2P cleaves genomic
DNA at a degenerate consensus sequence (TT/AAAA).
Accordingly, the presence of a TT/AAAA sequence
motif in the 5'-flanking region seems essential for Alu
insertion [5,36,37]. The Alu insertion site is generated
by a single-strand break that ORF2P makes in the
target DNA. This mechanism of Alu insertion is
called Target Primed Reverse Transcription (TPRT)
[8,37]. Indeed, we were able to document the occurrence
of this sequence motif - either a perfect match or
a highly similar sequence - in the 5' flanking regions of 
all new Alu insertions that were sequence-verified in 
this study. For the nineteen Alu insertions identified in 
this study, the characteristic sequence features of Alu 
and flanking sequences are summarized in Table 2. 
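
A search for the TT/AAAA motif that tolerates a "highly similar sequence" can be sketched as a sliding-window comparison. The Python example below is a minimal illustration and not the procedure used in this study: the limit of one mismatch, the function names, and the example flanking sequences are assumptions.

```python
from typing import Optional, Tuple

MOTIF = "TTAAAA"  # consensus cleavage site; cleavage occurs between TT and AAAA


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_motif_match(flank: str, motif: str = MOTIF,
                     max_mismatches: int = 1) -> Optional[Tuple[int, str, int]]:
    """Return (offset, matched window, mismatches) of the closest match, or None.

    Ties are resolved in favor of the leftmost window.
    """
    flank = flank.upper()
    best = None
    for i in range(len(flank) - len(motif) + 1):
        window = flank[i:i + len(motif)]
        d = hamming(window, motif)
        if d <= max_mismatches and (best is None or d < best[2]):
            best = (i, window, d)
    return best
```

With these settings, a hypothetical flank such as `GGCTTAAAAGC` yields a perfect match, whereas `CCTGAAAACC` is reported as a one-mismatch variant (TGAAAA), similar to several of the putative cleavage sites listed in Table 2.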

In addition to the Alu sequence itself, the genomic 
sequence adjacent to the recent Alu insertions 

Figure 2 PCR validation of putative Alu insertions (a-g). The Alu insertions were sorted based on genomic coordinates. The Alu insertions
were named AI1 through AI21. N: normal brain tissue DNA; E1, E2, and E3: brain tumor tissue (ependymoma) DNA from different individuals; P
and R: ependymoma DNA, where P is the primary and R the relapsed tumor from the same individual.



encompasses at least two typical sequence features. As a hallmark
of a recent retrotransposition event, the sequences
immediately flanking the Alu elements corresponded to
short direct repeats, ranging from 4 to 17 nucleotides. The
insertion mechanism generates direct target site duplications
(TSDs) flanking the newly inserted element. These
TSDs have variable length and are highly suggestive of
LINE-mediated endonucleolytic cleavage [12,38]. Such
short direct repeats, also called AT-rich target site
duplications, were present in 19 of the sequence-verified
genomic loci (Table 2).
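
Because a target site duplication is a short direct repeat found immediately upstream of the Alu element and again immediately downstream of its poly(A) tail, it can be detected by comparing the two flanks. The Python sketch below is illustrative only: it requires an exact match, the length limits and function name are assumptions, and the bases surrounding the TSD in the usage note are hypothetical.

```python
from typing import Optional


def find_tsd(upstream: str, downstream: str,
             min_len: int = 4, max_len: int = 20) -> Optional[str]:
    """Return the longest exact direct repeat shared by the two flanks.

    `upstream` ends at the last base before the Alu element;
    `downstream` starts at the first base after the poly(A) tail.
    """
    upstream = upstream.upper()
    downstream = downstream.upper()
    longest = min(max_len, len(upstream), len(downstream))
    for k in range(longest, min_len - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream[-k:]
    return None
```

For instance, `find_tsd("GGTCAAGAAATGTCT", "AAGAAATGTCTCCAG")` recovers the 11 bp repeat AAGAAATGTCT listed for AI-1 in Table 2. Real TSDs may carry a mismatch or overlap the end of the poly(A) tail, which an exact-match rule of this kind would miss.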

Methylation status of recent Alu elements 

All sequences generated in our previous studies, encompassing
Alu elements and their 5' flanking sequences,
were derived from bisulfite converted genomic DNA
[31,32]. Due to the high frequency of C-to-T transitions 
in CpG dinucleotides of Alu repeats caused by deamina- 
tion of the methylated cytosines, in the absence of a 
reference genomic sequence, one cannot determine the 
methylation status of a novel Alu insertion by this 
method. Hence, to examine the methylation pattern of 
the newly integrated Alu elements, we aligned the 
sequences generated in this study for nineteen of such 
Alu elements with their bisulfite converted sequences 
from our previous studies [31,32] (Additional File 3, Figure
S1). Our results showed that the recently inserted
Alu elements are heavily methylated, with an average 
methylation level of 90.7%; this is similar to the average 
methylation level observed for evolutionarily young non- 

Table 2 Sequence features of newly inserted Alu repeats.

Sample | Chromosome coordinate | Alu subfamily | TSD | Putative cleavage site | Poly-A sequence
AI-1 | chr15:61216453-61216621 | AluY | AAGAAATGTCT | TTa | CTCAAAAAAAAAAAAAAAGAAAAAAAAAAAAGAAAAAAGAAAT
AI-2 | chr5:139595078-139595241 | AluYb8 | TAAATOCAGA | TTaa | ctcaaaaaaaaaaaaaaaataaataaataaataaaTT
AI-3 | chr4:41598260-41598327 | AluY | AAGTACATGTGG | TGGAA | CTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAAGT
AI-4 | chr13:23662855-23663027 | AluYa5 | CATCTG | TTaaa | CTCCGTCTCAAAAAAAAAAAAAACAAAAAAAAAAACAAAAAAAAAAAACATCT
AI-5 | chr15:28179309-28179438 | AluY | ATAAAACATGGTCTG | TATAAAA | CTCAAAAAAAAAAAAAAAATAAAAAAAAATAAATAAAAAAATAAAACAT
AI-6 | chr12:32076361-32076491 | AluYg6 | GAAATAATOATCT | TGAAA | CTCAAAAAAAAAAAAAAAAAAAAAGAAATAAT
AI-7 | chr11:130675880-130675924 | AluY | AAAAAGAAGC | TTaaa | CTCAAAAAAAAAAAAAAAAAAAAGAAGCA
AI-8 | chr5:141758572-141758694 | AluYb8 | AAAAATGGGGA^ | TTaaa | CTCAAAAAAAAAAAAAAAAAATGGGGA
AI-9 | chr10:107891481-107891638 | AluYg6 | CGTGTGCTC | TTaaa | CTCAAAAAAAAAAACGTGTG
AI-10 | chr10:72605338-72605440 | AluYg6 | AAGAAGGTA | TAA | CTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAAGGT
AI-11 | chr2:48276482-48276601 | AluYb8 | AGAAATOAAATGCA | | CTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGAAAT
AI-12 | chr5:16716576-16716677 | AluYg6 | AAGAAGTATGACAG | TAA | CTCAAAAAAAAAAAAAAAAAAAAGAAGTAT
AI-13 | chr12:24518543-24518646 | AluY | AAAAAAGTATOATCA | TTaaa | CTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGTAT
AI-14 | chr6:57403535-57403610 | AluY | TCCTA | TAAT | CTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATOC
AI-15 | chr2:9888790-9888862 | AluYb8 | CACACCCGTG | TAA | CTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACACAC
AI-16 | chr9:1631754-1631884 | AluYb8 | AAGAA | CAAA | CTCAAAAAAAAAAAAAAAAAAAAAGAAAACA
AI-17 | chr4:139225139-139225274 | AluY | GAG! 1 1 1 lAAALAlLl | TTaa | CTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGGAGT
AI-18 | chr2:26623669-26623732 | AluYb8 | AAAATCAGTOTOC | TTaaa | CTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATCAG^
AI-19 | chr9:37594172-37594310 | AluY | AAGAAGTAGATATGG | TAA | CTCCAAAAAAAAAAAAAAGAAG
AI-20 | chr6:99872263-99872632 | | | |
AI-21 | chr2:145175223-145175483 | | | |


polymorphic Alu elements [31,32]. We further examined 
the methylation status of the two important promoter 
regions inside the Alu elements, the A and B boxes. Alu 
elements have a bipartite structure, which is similar to 
that of tRNA elements. It has been shown that the A 
box is responsible for determining the strength of the 
Pol III promoter and the B box is important in enabling 
transcription [30,39]. Also, deletion of the B box
sequence completely abolished transcription of the elements,
while deletion of the A box reduced the
efficiency of transcription [40]. In almost all cases, these
promoter sequences were methylated (Additional File 3,
Figure S1). This result suggests that transcription of
most newly inserted Alu elements is suppressed by
DNA methylation.
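
The reason a non-converted reference sequence is needed can be made concrete with a short example: CpG sites are defined on the genomic sequence, and each bisulfite read is then scored at those positions, where a retained C indicates methylation and a T indicates an unmethylated cytosine. The Python sketch below is a simplified illustration and not the analysis code of this study; it assumes gap-free, top-strand alignments of equal length, and the function names and example sequences are hypothetical.

```python
from typing import Iterable, List, Tuple


def cpg_calls(reference: str, bisulfite_read: str) -> List[Tuple[int, bool]]:
    """Score each reference CpG covered by an aligned bisulfite read.

    Returns (position, is_methylated) pairs. Positions where the read shows
    neither C nor T are skipped as uninformative.
    """
    reference = reference.upper()
    bisulfite_read = bisulfite_read.upper()
    if len(reference) != len(bisulfite_read):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            base = bisulfite_read[i]
            if base == "C":
                calls.append((i, True))    # protected from conversion: methylated
            elif base == "T":
                calls.append((i, False))   # converted: unmethylated
    return calls


def methylation_level(reference: str, reads: Iterable[str]) -> float:
    """Percentage of methylated calls over all informative CpG observations."""
    calls = [m for read in reads for _, m in cpg_calls(reference, read)]
    if not calls:
        raise ValueError("no informative CpG positions")
    return 100.0 * sum(calls) / len(calls)
```

Without the reference, a TpG in a read could equally reflect an unmethylated CpG or an ancestral CpG that has already mutated to TpG in the genome, which is why the PCR-cloned genomic sequences were required.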

Interestingly, we found one Alu element at
chr9:37594172-37594310 with a completely unmethylated
5'-end (AI19, Additional File 3, Figure S1). Since
amongst all Alu elements chosen for verification this
was the only element found to be unmethylated, and
also because only two bisulfite sequence reads had been
previously generated for this element [31,32], we
designed bisulfite PCR primers to amplify the entire Alu
element including the two flanking CpG dinucleotides
(Figure 3). Indeed, the 5'-end of this newly inserted Alu
element was found to be completely unmethylated while
its 3'-end exhibited some degree of DNA methylation. It
is noteworthy that the 5'-flanking CpG site was completely
unmethylated, and the 3'-flanking CpG site was
completely methylated. Importantly, the 5' terminal
nucleotide of this newly inserted Alu element mapped
1,576 bp downstream from a CpG island and 1,674 bp
downstream from the transcription start site of the
TOMM5 gene. This result suggests that the methylation
status of this Alu element is under the influence of the
epigenetic environment surrounding its insertion site.
Since this Alu insertion was found to be homozygous,
i.e. it was present in the two homologous chromosomes,
we were not able to investigate whether the Alu
insertion exerted any influence on the methylation status
of CpG dinucleotides flanking the Alu element. To
confirm our hypothesis that the methylation of the Alu
element is under the influence of the CpG island, we
ascertained the methylation status of a fragment
(chr9:37592324-37592701) corresponding to the 5' terminal
377 bp of the CpG island, and also of the AluJo element
flanking the 3' end (chr9:37594745-37595002) of
the newly identified Alu element that was partially
methylated. Indeed, we found that this CpG island
fragment was completely unmethylated while the AluJo
sequence flanking the 3' end of the newly inserted Alu
exhibited a methylation level of the order of 40%. Interestingly,
this AluJo exhibited a pattern of methylation
very similar to the pattern presented by the newly
inserted Alu element (Figure 3). There are 3 Alu repeats
and 1 MIR element localized between the newly inserted
Alu and the CpG island. The methylation levels of these
elements are indeed very low (Figure 3).

We conducted a similar analysis of two other genomic
loci, chr10:72605338-72605440 and chr2:48276482-
48276601, which were randomly chosen. The Alu insertions
at these two loci were found to be hemizygous.
This allowed us to compare the methylation status
of the alleles with and without the Alu insertion (Figure
4). The sequencing results derived from bisulfite-PCR
cloning demonstrated that both newly inserted Alu elements
were indeed heavily methylated, as anticipated
based upon our previously generated high-throughput
bisulfite sequencing data (AI10 and AI11, Additional
File 3, Figure S1). In addition, for the two genomic loci
examined, there was no methylation difference between
the alleles with and without the Alu insertion in the two
homologous chromosomes, nor was there a difference
in the methylation statuses of the CpG dinucleotides
flanking the chr2:48276482-48276601 Alu insertion site.
Furthermore, the CpG dinucleotide that is immediately
downstream of the chr10:72605338-72605440 Alu insertion
site was also found to be methylated. Due to the

Figure 3 Bisulfite PCR cloning and sequencing to validate methylation status of an unmethylated Alu insertion (chr9:37594172-
37594310). Asterisk indicates the CpG dinucleotides that are flanking the Alu element; the scheme shows the relative location of TOMM5 and
the CpG island in relation to the Alu AI19 insertion (UCSC Genome Bioinformatics). a) methylation status of a downstream AluJo (sequence
coordinates: chr9:37594745-37595002) near the newly inserted Alu element; b) newly inserted Alu element and its methylation status; c)
methylation status of 2 CpGs upstream of the newly inserted Alu; d), e), f), and g) methylation statuses of 3 Alu repeats and 1 MIR element
localized between the newly inserted Alu and the CpG island, respectively, AluSx, AluJo, MIRb, and AluSx; h) methylation status of the 5' end of a
CpG island located 1,576 bp (sequence coordinates: chr9:37592324-37592701) upstream from the newly inserted Alu element. Note that the
TOMM5 transcription unit is in opposite orientation to that of the newly inserted Alu element. The methylation levels for a, b, c, d, e, f, g, and h
were 40%, 33.7%, 79.1%, 4.2%, 0%, 3%, 32%, and 0.6%, respectively.

Figure 4 Methylation statuses of two pairs of hemizygous alleles, i.e. before and after Alu insertions. a) chr10:72605338-72605440 locus;
b) chr2:48276482-48276601 locus. Schemes on the left side represent the allele not containing the Alu insertion, while the figure on the right
side represents the allele in which the Alu element inserted. The blue arrow indicates the probable site of Alu integration; CG and asterisk
indicate the CpG dinucleotides that are flanking the Alu element; red bar indicates the Alu element that was inserted.



low CpG density of its 5' flanking genomic sequence, no 
methylation data were derived for the region upstream 
of chr10:72605338-72605440. To identify methylation
differences among samples, we calculated the methyla- 
tion level of all mapped Alu elements, and also that of 
the structural variants present in the 19 loci verified. 
This analysis revealed no methylation differences among 
tissues (Additional File 4, Table S3). 

Discussion 

Recent studies demonstrated that major structural var- 
iants in the human genome are derived from retrotran- 
sposons, Alu elements in particular [16,41]. Due to the 
extensive sequence homology that exists among young 
Alu repeats, the identification of such structural variants 



remains a challenging task. To date, a total of 2,558 
polymorphic Alu retrotransposons have been reported 
to occur in human populations, 1,763 of which have 
been deposited in dbSNP and 795 in dbRIP. In this 
study we implemented a computational pipeline to iden- 
tify recent Alu insertions, and examined the methylation 
status of the newly inserted Alu retrotransposons and 
their flanking sequences. At the time we developed this 
strategy the Genome Sequencer FLX System was the 
most suitable alternative available, given the greater 
length of the sequence reads that it generates, and the 
fact that sequences would encompass an Alu repeat and 
would be derived from bisulfite-converted genomic 
DNA. Altogether, the longer reads generated with the 
FLX System greatly facilitated their mapping back to the 
reference genome sequence. Notwithstanding this
advantage, however, we anticipate that our approach 
may be adapted to take advantage of competing next 
generation sequencing platforms that have a higher 
throughput and that can now generate sufficiently long 
sequence reads. 

Using this strategy a total of 327 putative Alu ele- 
ments were identified. We found that 42.8% of their 
insertion sites fell within intronic regions, while one 
integration site mapped to the 3'-UTR of the TOMM40 
gene. TOMM40 is a component of the preprotein trans- 
locase complex of the outer mitochondrial membrane, 
which consists of at least 7 different proteins (TOMM5, 
TOMM6, TOMM7, TOMM20, TOMM22, TOMM40, 
and TOMM70). These results are consistent with pre- 
vious studies indicating that Alu retrotransposons tend 
to be inserted within intragenic regions [1,42]. 

Out of the twenty-one insertion events that were ran- 
domly selected for validation analysis, nineteen were 
successfully verified. A limitation of the Alu-anchored 
bisulfite PCR approach that needs to be acknowledged
is the fact that only 5' flanking sequences are obtained. 
The right arm of the Alu retrotransposons and their 3' 
flanking sequences are not represented in the sequence 
reads that are generated. Hence, in order to design pri- 
mers for the validation experiments, we used the refer- 
ence sequence of the human genome as source of 
putative 3'-flanking sequences for the Alu insertions. 
Accordingly, it is possible that the two cases that could 
not be verified may have been caused by the utilization 
of an incorrect 3' flanking sequence for primer design. 
Notwithstanding this limitation, the lowest estimated 
accuracy for the analysis pipeline that we have imple- 
mented in this study for the identification of de novo 
Alu retrotransposition events would be 90.5% (19/21).
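
The lower-bound figure follows directly from the validation counts. A minimal Python sketch of the arithmetic is given here; the variable names are illustrative and not part of the original analysis.

```python
# Validation counts reported in the text.
verified = 19   # insertion events confirmed by PCR, cloning, and sequencing
tested = 21     # insertion events randomly selected for validation

# Events that could not be verified are counted as false positives,
# which is why the result is a lower bound on pipeline accuracy.
lower_bound_accuracy = verified / tested

print(f"{lower_bound_accuracy:.1%}")  # prints 90.5%
```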

The sequence features (TSD, TT/AAAA cleavage 
sequence, and A-rich Alu tail) that are typically 
observed in newly inserted Alu elements constitute hall- 
marks of retrotransposition [5,10,36,37]. Indeed, further 
analysis of the aforementioned nineteen PCR-cloned Alu 
elements and flanking sequences revealed the presence 
of both the TSD and TT/AAAA sites. Alu A-tails seem 
to be an important factor to enable Alu element retro- 
transposition [4,35]. Roy-Engel et al. reported that the 
average A-tail length of active Alu elements is 26 [35]. 
Consistent with their finding, the A-tail sizes of the
Alu elements described in this study ranged from 11 to 45 bp,
with an average of 29 bp.

Most cancer genomes are characterized by localized 
hypermethylation as well as by global hypomethylation 
[43,44]. This hypomethylation process may enable tran- 
scription and de novo retrotransposition of Alu elements 
which, in turn, may lead to genome instability [45]. Our
previous study demonstrated that the methylation level 
of Alu elements decreased in ependymomas, and most 
significantly in recurrent tumors [31]. To examine 
whether some Alu insertions represented somatic events 
limited to recurrent ependymomas, which could have 
occurred as a consequence of the loss of DNA methylation,
we generated and compared PCR products from
ten genomic loci in primary and in recurrent tumors 
derived from one individual. The same results were 
obtained in all ten genomic loci. In addition, five of the 
ten Alu insertions were also found in other individuals. 
These results suggested that such validated Alu inser- 
tions most likely represent germ-line rather than 
somatic events. 

In this study, in addition to identifying structural var- 
iants in the genome of 6 individuals, we investigated 
epigenetic variations that might result from de novo
retrotransposition events. The Alu elements identified in
this study were heavily methylated, as was previously
shown by high-throughput bisulfite sequencing and 
herein validated by cloning and sequencing analyses. 
The analysis of methylation throughout the mapped 
Alus and among the 19 loci verified revealed that there 
were no methylation differences among tissues (Addi- 
tional File 4, Table S3). This result indicates that at least 
by the time these DNA samples were obtained most of 
the newly inserted Alu elements were already transcrip- 
tionally repressed. This finding is further supported by 
the fact that the promoters of the Alu elements, i.e. 
their A and B boxes, were found to be methylated. 
However, there was one exception. We found an Alu 
insertion that was partially unmethylated 
(chr9:37594172-37594310). Interestingly, the insertion of 
this Alu element occurred 1,576 bp downstream from a 
CpG island and 1,674 bp downstream from the tran- 
scription start site of TOMM5, a gene encoding the 
translocase of the outer mitochondrial membrane 5. 
With a completely unmethylated promoter (both the A 
box and the B box were unmethylated), it is conceivable 
that this Alu element may have remained transcriptionally
active and hence have served as a source for additional
retrotransposition events. Another interesting
finding was that a CpG island that is upstream of the 
element - i.e., that of TOMM5 - may be influencing the 
methylation pattern of this Alu repeat. Indeed, the 
methylation status of the CpG island was similar to that 
of the 5' end sequences of this Alu repeat, i.e. both were 
unmethylated. It would be interesting to explore the 
functional impact of this particular Alu on the nearby 
TOMM5 gene. Additionally, 3 Alu repeats and 1 MIR 
element that are localized between the newly inserted 
Alu repeat and the CpG island were found to exhibit 
very low methylation levels. Such a striking pattern of
DNA methylation may indeed be an indication of the 
influence exerted by the adjacent CpG island. It is also 
possible that other epigenetic factors might be affecting
the methylation statuses of these Alu elements, such as 
nucleosome positioning. Two previous studies have 
reported the influence of nucleosome positioning, within
and around Alu elements, on Alu activity [46,47]. Accordingly,
it is noteworthy that an AluJo that is localized
downstream of this newly inserted Alu exhibits a similar 
pattern of DNA methylation, i.e. its 5' half is unmethy- 
lated while its 3' half is methylated. In our previous 
study [32], we found that genomic localization has a 
profound impact on Alu methylation status. In this 
study, the identification of both methylated and 
unmethylated Alu elements provided additional support
for a positional effect on Alu methylation.
Last, but not least, it is noteworthy that two of the 
novel Alu insertions herein reported map within or near 
genes encoding members of the preprotein translocase 
complex of the outer mitochondrial membrane, namely 
TOMM40 and TOMM5, respectively. It is conceivable 
that given their housekeeping function and ubiquitous 
expression pattern, hence commonly open chromatin 
status, these genes may be more vulnerable to de
novo retrotransposition events.

To explore the epigenetic impact of Alu insertion on 
adjacent genomic sequences, we examined the methyla- 
tion statuses for two loci harboring hemizygous inser- 
tions, and - in one case - obtained the methylation 
patterns of CpG dinucleotides flanking the Alu insertion 
sites. Both alleles - irrespective of the presence of an 
inserted element - were found to be heavily methylated, 
and no significant epigenetic variation was observed in 
association with the presence of the additional Alu 
element. 

Conclusions 

In this work we have identified a few novel Alu insertion
sites. We used DNA samples from normal and
from tumor tissues, but the data obtained did not show 
any tissue preference for these insertions. Further studies
are needed to scrutinize the functional
aspects of structural variants in the human genome,
including epigenetic variations that might arise as a
consequence of a de novo retrotransposition event.

Methods 

High-throughput bisulfite sequencing datasets for Alu 
elements 

The high-throughput bisulfite sequencing data were 
derived from Alu-anchored bisulfite PCR libraries
prepared from tissue samples, including a normal cerebellum,
a normal 4th ventricle lining, two primary non-aggressive,
two primary aggressive, and two recurrent
ependymomas [31,32]. Briefly, genomic DNA is first
digested with the AluI restriction enzyme, ligated to
adaptors and then subjected to bisulfite treatment.
Bisulfite-treated DNA is amplified with adaptor and Alu-specific
primers, the latter targeting a large pool of CpG-rich
Alu elements. Thus, each PCR product contains the
5' end of an Alu element and its (most often) unique
flanking genomic sequence, which makes it possible for 
each sequence to be unambiguously mapped to the 
reference human genome. Primary non-aggressive ependymomas
are defined as primary tumors from patients
free of disease progression for more than 4 years, and
primary aggressive ones are defined as primary tumors
from patients with recurrent disease within 3 years or
who died of the disease.

Computational pipeline for the identification of recent 
Alu insertions 

To identify putatively recent Alu insertions, sequence
reads rejected in previous studies were selected. Briefly,
after removal of primer and adaptor sequences,
sequences longer than 40 bp were aligned to the in
silico bisulfite-converted reference genome using multiple
cycles of MegaBLAST. The word size used in MegaBLAST
was set to 100 for the first cycle and was decreased
by 20 for every cycle thereafter until the last, for which
the minimum length of best perfect match was set to
40. In addition, the identity percentage cutoff for a
significant alignment was set to 100% for the last cycle
and 95% for all other cycles of MegaBLAST [32]. The
sequence reads that mapped to genomic loci within 10
bp of an Alu element were considered putative
recent Alu insertions.
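
The parameter schedule and the proximity filter described above can be sketched in Python as follows. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, the in silico conversion covers only the C-to-T change on one strand with every cytosine treated as unmethylated, the final cycle is assumed to use a word size of 40, and distance is assumed to be measured to the nearest boundary of an annotated Alu element.

```python
from typing import List, Tuple


def bisulfite_convert(seq: str) -> str:
    """In silico bisulfite conversion of one strand: every C is read as T.

    Simplification (assumption): full conversion, no methylated cytosines.
    """
    return seq.upper().replace("C", "T")


def megablast_schedule(first_word_size: int = 100, step: int = 20,
                       last_word_size: int = 40) -> List[Tuple[int, int]]:
    """Return (word_size, percent_identity_cutoff) for each alignment cycle.

    The identity cutoff is 95 for every cycle except the last, which
    requires 100, as stated in the text.
    """
    sizes = list(range(first_word_size, last_word_size - 1, -step))
    return [(w, 100 if w == sizes[-1] else 95) for w in sizes]


def near_alu(read_pos: int, alu_start: int, alu_end: int,
             max_dist: int = 10) -> bool:
    """True if a mapped position lies within max_dist bp of an Alu element."""
    if alu_start <= read_pos <= alu_end:
        return True
    return min(abs(read_pos - alu_start), abs(read_pos - alu_end)) <= max_dist


print(megablast_schedule())
# prints [(100, 95), (80, 95), (60, 95), (40, 100)]
```

Decreasing the word size stepwise lets long, unambiguous matches be assigned first, leaving shorter reads for the later and stricter cycles.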

PCR, cloning, and sequencing

For PCR primer design, the original (not bisulfite-converted)
DNA sequences flanking the predicted Alu
insertion sites were extracted from the UCSC reference
human genome, based on their genomic coordinates
[48]. PCR primers were designed in the region surrounding
the Alu insertion sites. PCR reactions were
performed using HotStartTaq Plus Master Mix from
QIAGEN. Each reaction was prepared as follows: 12.5
µL of HotStartTaq mix, 30 ng of DNA, 14 µM of each
primer, and enough water for 25 µL. The PCR reactions
were performed on an MJ Research machine (model PTC
225). Reactions were subjected to an initial activation
step of 95°C for 15 min, then a denaturation step of
94°C for 1 min, followed by 40 cycles of 1 min at 94°C,
30 s at the optimal annealing temperature, and 40 s at 72°C,
followed by a final extension step of 10 min at 72°C.
PCR product annealing temperatures (Tm) and the primers
used in each reaction are listed in Additional File 5,
Table S4. After the reactions were completed, the amplified
fragments were separated by 1.5% agarose gel electrophoresis,
stained with ethidium bromide, and
visualized using a UV fluorescence system. Electrophoresis was
carried out until a good separation of bands was
obtained. The bands were then excised from the gel and
purified using a gel purification kit from QIAGEN
(QIAquick PCR Purification Kit). The purified PCR products
were cloned using the TOPO TA Cloning System from Invitrogen.
Sequencing reactions for individual colonies were conducted
at the Sequencing Core Facility of the Children's
Memorial Research Center of Northwestern University's
Feinberg School of Medicine.
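
As a quick check on the thermal cycling program described above, the total programmed hold time can be added up as in the Python sketch below. The variable names are illustrative, and ramp times between temperatures, which depend on the instrument, are not included.

```python
# Hold times in seconds, taken from the program described in the text.
initial_activation = 15 * 60     # 95 °C for 15 min
initial_denaturation = 1 * 60    # 94 °C for 1 min
cycle_steps = [60, 30, 40]       # 94 °C, annealing, 72 °C within each cycle
n_cycles = 40
final_extension = 10 * 60        # 72 °C for 10 min

total_seconds = (initial_activation + initial_denaturation
                 + n_cycles * sum(cycle_steps) + final_extension)

print(f"{total_seconds / 60:.1f} min")  # prints 112.7 min
```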

Bisulfite PCR

Bisulfite conversion of genomic DNA was performed
with the EZ DNA Methylation Gold kit (Zymo Research
Corporation, Irvine, CA) following the manufacturer's
instructions. 300 ng of genomic DNA was treated and
eluted with 10 µL of elution solution. After this step,
DNA from the chr10:72275361-72275449 genomic locus
was amplified using the pair of primers: 5'- GGA TTA 
AGT TTT TTT TTT GTT T -3' and 5'- GTA GAA 
AAA AAA ATA AGT GAT A -3'; the chr2:48129974- 
48130105 genomic locus was amplified using the pair of 
primers: 5'- GGT TAG GAT TTA AAA ATA AAA 
AAT GAA -3' and 5'- GTT TAA GAT TTA AAG GAA 
TGA GTT AG -3'. PCR reactions were prepared using
the same reagents and conditions described above. The
PCR program used was: an activation step of 95°C for 15
min, then a denaturation step of 94°C for 1 min, followed
by 40 cycles of 1 min at 94°C, 30 s at the optimal
annealing temperature (49°C for locus chr10:72605338-
72605440 and 42°C for locus chr2:48276482-48276601),
and 40 s at 72°C, followed by a final extension step of
10 min at 72°C. PCR-amplified fragments were separated
in 1.5% agarose and excised from it as described above.

To amplify the 5' end of the CpG island and the
sequence flanking the 3' end of the Alu element (AI19
chr9:37594172-37594310), we used nested PCR. The primers
used for the CpG island were: external primers
(TOMM5-eFor 5'- AAG TTG GGA GAA TTA GGA 
TGA TT -3' and TOMM5-eRev 5'- GTA ATT TTT 
ATA TAA GAA ATA TTA TTA AAA AGA AG -3'), 
internal primers (TOMM5-iFor 5'- GTA TTT TTA 
GAG TTA AGG GGT GT -3' and TOMM5-iRev 5'- 
GAG TTG AAA TGA AGT AAA TGA AAA G -3'). The 
primers used for the 3' Alu flanking sequence were:
external primers (ch9Down-eFor 5'- TTT GTA GTG 
ATG TTG AAA GTA GTA AGA- 3' and ch9Down- 
eRev 5'- AAT ATA TAG GTT GGG TTT GGA AGT 
-3'), internal primers (ch9Down-iFor 5'- TTT ATT TTA 
GAT TGA GTT TTG TTT TGT -3' and ch9Down-iRev 
5'- GTT AAA GGG AAA AAT ATA AAA TTA GAA 
TAG -3'). The PCR program used was the same as that
mentioned above; the annealing temperatures (Tm) were 51°C for
external primers and 50°C for internal primers.
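
Once bisulfite PCR clones have been sequenced, the methylation status of each CpG can be read by comparing the clone with the unconverted reference: a cytosine that is still read as C was protected from conversion (methylated), whereas one read as T was converted (unmethylated). The Python sketch below illustrates this principle. It is not the procedure used in the study; the function names are hypothetical, and it assumes forward-strand sequences that are already aligned without gaps.

```python
from typing import Dict


def call_cpg_methylation(reference: str, bisulfite_read: str) -> Dict[int, bool]:
    """Call methylation at each CpG of `reference` from an aligned read.

    Both sequences must have the same length (gap-free alignment).
    Returns {0-based position of the C in each CpG: True if methylated}.
    """
    if len(reference) != len(bisulfite_read):
        raise ValueError("sequences must be aligned and of equal length")
    ref, read = reference.upper(), bisulfite_read.upper()
    calls: Dict[int, bool] = {}
    for i in range(len(ref) - 1):
        if ref[i:i + 2] == "CG":
            if read[i] == "C":
                calls[i] = True    # protected from conversion: methylated
            elif read[i] == "T":
                calls[i] = False   # converted to T: unmethylated
            # any other base (e.g. sequencing error) is left uncalled
    return calls


def methylation_level(calls: Dict[int, bool]) -> float:
    """Fraction of called CpGs that are methylated (NaN if none were called)."""
    return sum(calls.values()) / len(calls) if calls else float("nan")


# Toy sequences (not from the study): CpGs at positions 1 and 5.
calls = call_cpg_methylation("ACGTTCGA", "ATGTTCGA")
print(calls, methylation_level(calls))  # prints {1: False, 5: True} 0.5
```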

Alu classification 

As mentioned above, the initial data used for
this study were derived from Alu-anchored bisulfite
PCR libraries. These libraries were constructed using
primers specially designed to target the most recent
and active Alu elements, which are the ones that
belong to the AluY family. To classify the Alu insertions
verified in this study within the AluY family, we
aligned the sequences generated in this study with
those contained in the UCSC Genome Browser. From
this alignment, the element with the highest score was
used to classify a newly identified Alu insertion into
the AluY family.
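
The best-hit rule described above amounts to taking the annotated element with the maximum alignment score. A minimal Python sketch follows; the function name and the example scores are hypothetical and serve only to illustrate the selection step.

```python
from typing import List, Tuple


def classify_by_best_hit(hits: List[Tuple[str, float]]) -> str:
    """Return the name of the annotated element with the highest score.

    `hits` holds (element_name, alignment_score) pairs for one insertion.
    If several hits share the top score, the first one listed is returned.
    """
    if not hits:
        raise ValueError("no alignments to classify")
    best_name, _best_score = max(hits, key=lambda hit: hit[1])
    return best_name


# Hypothetical scores for one cloned insertion (illustration only).
example_hits = [("AluY", 377.0), ("AluYa5", 412.0), ("AluYb8", 398.0)]
print(classify_by_best_hit(example_hits))  # prints AluYa5
```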

Additional material 



Additional file 1: Table S1. Putatively recent Alu insertions. Alu
insertions identified in eight Alu bisulfite PCR libraries.

Additional file 2: Table S2. Clusters of putative Alu insertions. Alu 
insertions and their gene annotation. 

Additional file 3: Figure S1. Methylation pattern of recent Alu
insertions. To determine the methylation status, the sequences
corresponding to the first half of Alu elements plus their 5' flanking regions
[31,32] were aligned to the Alu element sequences generated in this
study. NC1: normal cerebellum and NC2: normal 4th ventricle lining
tissue; PA1, PA2, PA3, PA4, and PA5: primary ependymoma tumors; RL:
ependymoma tumor relapsed from PA3.

Additional file 4: Table S3. Alu methylation level. Methylation levels of 
mapped and 19 non-mapped Alu elements. 

Additional file 5: Table S4. Validation of identified Alu elements. 
Primers designed for Alu elements validation. 